An introduction to coding in R

Tom Keaney

Goals

  • Set up R to work as you intend
  • Code in a reproducible, clear style
  • Gain familiarity with key data wrangling functions
  • Become well versed in creating figures with ggplot
  • Create a document that should help you in the future

Installation

We will run R using Quarto documents in RStudio

Download the latest versions:

Quarto: combines written text and R code in a single document

  • Code is written in specified code chunks
  • Notes can be written outside code chunks without #
  • Your code can be turned into an elegant report

Setting up

Make your interface look nice:

Projects

File > New project

  • Existing directory: places project in an existing folder
  • New directory: creates new folder
  • Version control: handy if you want to use GitHub

Projects are powerful:

  • R knows where to look for files
  • No need to worry about setting working directories
  • Great for sharing

Quarto

  • Open a Quarto document in your new project

  • File > New File > Quarto document

  • Save the document within the project directory (where you already are)

  • Save the _quarto.yml provided in the email within this directory

  • Render the document

Packages

The base version of R can be extended with packages

We shall use the tidyverse collection of packages.

#install.packages("tidyverse")
#install.packages("pander")
#install.packages("patchwork")
#install.packages("MetBrewer")

library(tidyverse) # for tidy coding
library(pander) # for nice tables
library(patchwork) # for aligning plots
library(MetBrewer) # for nice colours to use when making figures
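
The commented install lines above only need to run once per machine. A minimal sketch of a check for which packages are still missing; `missing_packages()` is a hypothetical helper written for this illustration, not part of any package:

```r
# Hypothetical helper: return the names of packages that are not yet installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE or FALSE without attaching the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "pander", "patchwork", "MetBrewer"))
to_install

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```

If `to_install` is empty, everything is already in place and the `library()` calls should work.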

The tidyverse

  • A collection of packages

  • All follow the same logic

  • Quite different from base R

  • “Supremely readable”

Rhamphorhynchus muensteri

The dataset

pterosaur_data <-
  read_delim("pterosaur_data.csv", delim = ";")

pterosaur_data
  1. Assign a name to the dataset
  2. Load the csv file and specify the separator between columns
  3. Display the dataset
# A tibble: 138 × 15
   Individual_ID ORBIT SKULL  NECK TRUNK_LENGTH  TAIL HUMERUS RADIUS
           <dbl> <dbl> <dbl> <dbl>        <dbl> <dbl>   <dbl>  <dbl>
 1             1  11      40  21.5         47.5  106.    16.5   26.7
 2             2  10      35  18           NA    112     15.3   26  
 3             3  NA      NA  NA           36     85     NA     NA  
 4             4  NA      36  NA           39    115     17     28  
 5             5  NA      NA  15.5         36.5  100     14.6   23.8
 6             6  12      35  23           40    110     NA     NA  
 7             7  10      35  20           41    106     15.5   24  
 8             8  NA      31  NA           NA     NA     14.5   24  
 9             9  NA      NA  17           36     79     NA     NA  
10            10  12.5    41  23           47    125     19     32  
# ℹ 128 more rows
# ℹ 7 more variables: METACARPAL_4 <dbl>, WING_PHALANX_1 <dbl>,
#   WING_PHALANX_2 <dbl>, WING_PHALANX_3 <dbl>, WING_PHALANX_4 <dbl>,
#   FEMUR <dbl>, TIBIA <dbl>
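
Why the `delim` argument matters can be seen with a toy file. This is a sketch with made-up values written to a temporary file, not the real pterosaur data:

```r
library(readr)

# Write a tiny semicolon-separated file to a temporary location (toy values)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Individual_ID;SKULL;TAIL",
             "1;40;106",
             "2;35;NA"),
           toy_path)

# With the right separator we get three columns, and NA is read as missing
toy_data <- read_delim(toy_path, delim = ";")
toy_data

# With the wrong separator every row lands in a single column
wrong_delim <- read_delim(toy_path, delim = ",")
ncol(wrong_delim)
```

If a freshly loaded dataset has only one column, the separator is the first thing to check.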

Some intriguing patterns

The key to tidyverse coding: %>%

This weird symbol is called a pipe

  • You should read this as then

  • do this, then do this…

  • allows you to chain your code

pterosaur_data %>% view()

In English this means: take the pterosaur data, then view it in a new tab
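
To see what the pipe buys you, compare the same calculation written nested and piped. A small sketch using made-up tail lengths:

```r
library(dplyr)  # loading dplyr makes %>% available

tail_lengths <- c(106, 112, 85)  # toy values in mm

# Nested: has to be read from the inside out
nested <- round(mean(tail_lengths / 10), 1)

# Piped: read left to right as "convert to cm, then average, then round"
piped <- (tail_lengths / 10) %>% mean() %>% round(1)

identical(nested, piped)
```

Both versions do exactly the same work; the piped one just lists the steps in the order they happen.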

The holy trinity

  • select(): order, rename or drop columns

  • filter(): keep or remove specific rows

  • mutate(): create new columns or edit existing ones

If you ever need help with a function, ? is your friend

# An example of how to get some help
?select

select()

Removing columns you aren’t interested in:

pterosaur_data %>% select(Individual_ID, TAIL)

pterosaur_data %>% select(!c(Individual_ID, TAIL))

pterosaur_data %>% select(contains("WING"))

pterosaur_data %>% select(1, 5)
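  1. In the second line, ! reverses the selection, keeping every column except those named
  2. In the third line, contains() chooses columns whose names contain a pattern
  3. In the fourth line, columns are chosen by position. Dangerous coding! Avoid.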
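
Why is selecting by position dangerous? A sketch with a toy tibble (made-up values) shows the same code silently returning a different column once the column order changes:

```r
library(dplyr)

toy <- tibble(Individual_ID = 1:2, ORBIT = c(11, 10), TAIL = c(106, 112))

# Column 3 is TAIL here
toy %>% select(1, 3) %>% names()

# Someone adds a column near the front: position 3 is no longer TAIL
toy_reordered <- toy %>% mutate(SKULL = c(40, 35), .after = Individual_ID)

by_position <- toy_reordered %>% select(1, 3) %>% names()
by_name     <- toy_reordered %>% select(Individual_ID, TAIL) %>% names()

by_position  # now picks up ORBIT, with no warning
by_name      # still picks up TAIL
```

Selecting by name keeps working however the columns are arranged.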

Changing column names:

pterosaur_data %>% select(Specimen = Individual_ID)

# if you want to keep all other columns

pterosaur_data %>% select(Specimen = Individual_ID, everything())

# a recommended alternative

pterosaur_data %>% rename(Specimen = Individual_ID)

select() use cases

  1. Create a new dataset that contains only the ID of the individual and the measurements for wing phalanges 2, 3 and 4.
  2. Returning to the original data, remove the measurements for wing phalanges 2 and 4.
  3. Why does this cause an error?
pterosaur_data %>% select(contains(WING))

filter()

Choosing rows of interest

Large_data <- pterosaur_data %>% filter(TAIL > 200)

ten_cm_tails <- pterosaur_data %>% filter(TAIL == 100)

long_tails_and_small_heads <- 
  pterosaur_data %>% filter(TAIL > 200 & SKULL < 90)

long_tails_or_small_heads <- 
  pterosaur_data %>% filter(TAIL > 200 | SKULL < 90)
  1. == tests for equality; a single = does not work inside filter()
  2. | means "or", & means "and"
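
The difference between & and | is easiest to see on a tiny example. A sketch using four made-up individuals that cover every combination:

```r
library(dplyr)

toy <- tibble(
  Individual_ID = 1:4,
  TAIL  = c(250, 250, 100, 100),
  SKULL = c(80, 120, 80, 120)
)

both   <- toy %>% filter(TAIL > 200 & SKULL < 90)  # must pass both tests
either <- toy %>% filter(TAIL > 200 | SKULL < 90)  # must pass at least one

both$Individual_ID    # only individual 1
either$Individual_ID  # individuals 1, 2 and 3
```

Only individual 4 fails both tests, so it is the only one missing from the "or" result.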

Dealing with NA values

# remove NAs in single column
pterosaur_data %>% filter(!is.na(ORBIT))

# remove all rows with NAs
pterosaur_data %>% filter(if_all(everything(), ~ !is.na(.x)))
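
Why is.na() rather than == NA? A short sketch with a toy tibble (made-up values): comparing anything with NA gives NA, and filter() drops rows where the test is NA.

```r
library(dplyr)

toy <- tibble(Individual_ID = 1:3, ORBIT = c(11, NA, 12))

# Comparing with NA gives NA, not TRUE or FALSE
NA == NA

# So this test is NA for every row, and filter() keeps none of them
with_equals <- toy %>% filter(ORBIT == NA)
nrow(with_equals)

# is.na() asks the question directly
missing_orbit <- toy %>% filter(is.na(ORBIT))
known_orbit   <- toy %>% filter(!is.na(ORBIT))
```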

filter() use cases

  1. Find pterosaurs whose neck is longer than their humerus
  2. Returning to the original data, remove rows with NA TRUNK_LENGTH values among individuals with IDs greater than 50
  3. Trim the data to only include SKULL lengths between 60 and 90 mm

filter() use cases

  1. Find the individuals with the maximum and minimum tail lengths
  2. Find the individuals with tail lengths above the mean of the sampled population

mutate(): modifying existing columns

Let’s change the units of measurement to centimetres

pterosaur_data_cm <- pterosaur_data %>% mutate(ORBIT = ORBIT/10)

That only changed the values in one column - no matter, try this:

pterosaur_data_cm <- 
  pterosaur_data %>% 
  mutate(across(ORBIT:TIBIA, ~ .x/10))
  1. across() applies a function to all specified columns
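
A sketch of across() on a toy tibble (made-up values), small enough to check the result by eye. The range ORBIT:TAIL covers every column from ORBIT to TAIL in their current order:

```r
library(dplyr)

toy_mm <- tibble(Individual_ID = 1:2, ORBIT = c(11, 10), TAIL = c(106, 112))

toy_cm <- toy_mm %>%
  mutate(across(ORBIT:TAIL, ~ .x / 10))  # .x stands for each column in turn

toy_cm  # Individual_ID is outside the range, so it is left untouched
```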

mutate(): creating new columns

The total length of a wing is roughly the sum of the lengths of the humerus, radius, fourth metacarpal and the four wing phalanges. With mutate(), we can calculate this and add it to the dataset:

pterosaur_data <-
  pterosaur_data %>% 
  mutate(single_wing_length = 
           HUMERUS + RADIUS + METACARPAL_4 + WING_PHALANX_1 + 
           WING_PHALANX_2 + WING_PHALANX_3 + WING_PHALANX_4) %>% 
  select(Individual_ID, single_wing_length, everything())

Conditional mutation

Can we place individuals into phenotypic classes?

pterosaur_data_age_structured <-
  pterosaur_data %>% 
  mutate(Phenotypic_class = case_when(
    single_wing_length < 300 ~ "Small",
    single_wing_length >= 300 ~ "Large",
    .default = "Unknown"))

pterosaur_data_age_structured %>% 
  select(Individual_ID, Phenotypic_class, single_wing_length)
  1. For this subset of cases…
  2. For a second subset of cases…
  3. For all remaining cases…
# A tibble: 138 × 3
   Individual_ID Phenotypic_class single_wing_length
           <dbl> <chr>                         <dbl>
 1             1 Small                          183.
 2             2 Small                          174.
 3             3 Unknown                         NA 
 4             4 Small                          189.
 5             5 Small                          166.
 6             6 Unknown                         NA 
 7             7 Small                          164.
 8             8 Unknown                         NA 
 9             9 Unknown                         NA 
10            10 Small                          221 
# ℹ 128 more rows
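
Why do some individuals end up as "Unknown"? A sketch with made-up wing lengths: a sum that includes an NA is itself NA, and a comparison with NA is neither TRUE nor FALSE, so those rows fall through to `.default` (which needs a recent version of dplyr).

```r
library(dplyr)

# Any NA in a sum makes the whole sum NA
16.5 + NA

# A comparison with NA is also NA, so neither condition below is TRUE
NA < 300

toy <- tibble(single_wing_length = c(183, 450, NA))  # toy values

toy_classed <- toy %>%
  mutate(Phenotypic_class = case_when(
    single_wing_length < 300  ~ "Small",
    single_wing_length >= 300 ~ "Large",
    .default = "Unknown"))

toy_classed
```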

Conditional mutation

It’s also possible to mutate a single row

pterosaur_data %>% 
  mutate(Sex = case_when(Individual_ID == 1 ~ "Special",
                         .default = "Ordinary"))

Build your phenotypic classes

  • Not every individual has a recorded wing length.

  • But there are other morphological traits in the dataset

  • Create classification criteria and implement them

Phenotypic classes

Bonus content: summarise()

  • The logic of mutate() extends to summarise(), which collapses rows into summary values
  • Rows can be grouped with group_by() so that each group is summarised separately
pterosaur_data_age_structured %>% 
  group_by(Phenotypic_class) %>% 
  summarise("Wing length" = mean(single_wing_length, na.rm = TRUE))
  1. mean() has a built-in way to deal with NA values: na.rm = TRUE removes them before averaging
# A tibble: 3 × 2
  Phenotypic_class `Wing length`
  <chr>                    <dbl>
1 Large                     490.
2 Small                     197.
3 Unknown                   NaN 
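
Where does the NaN come from? A sketch on a toy tibble (made-up values): when every value in a group is NA, na.rm = TRUE leaves nothing to average. Counting rows with n() alongside the mean makes this visible; the column names n_individuals, n_measured and mean_wing are chosen here for illustration.

```r
library(dplyr)

toy <- tibble(
  Phenotypic_class   = c("Small", "Small", "Unknown"),
  single_wing_length = c(180, NA, NA)
)

toy_summary <- toy %>%
  group_by(Phenotypic_class) %>%
  summarise(
    n_individuals = n(),                           # rows in each group
    n_measured    = sum(!is.na(single_wing_length)),  # non-missing values
    mean_wing     = mean(single_wing_length, na.rm = TRUE)
  )

toy_summary  # the Unknown group has n_measured = 0, hence a mean of NaN
```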

Your task

  1. Split pterosaurs into phenotypic classes and remove those you can’t categorise
  2. Trim the dataframe to only include class, skull length, wing length and tail length
  3. Summarise the data to show the mean for morphological traits, for each class
  4. Convert to cm and round to zero decimal places

Focus on writing clear code, with comments (using the #) accompanying each important step.

Hint: the round() function can be used inside mutate()

Table making

Once complete, pipe your polished dataframe into this function with %>% to make a neat table

# your dataframe goes here %>% 
 pander(split.cell = 20, split.table = Inf)
  1. See ?pander

Expanding our vocabulary

  • distinct()

  • slice()

  • n()

  • bind_rows()
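
A quick sketch of the four functions above in action, using two made-up batches of measurements (batch_1 and batch_2 are names invented for this illustration):

```r
library(dplyr)

batch_1 <- tibble(Individual_ID = c(1, 2, 2), TAIL = c(106, 112, 112))
batch_2 <- tibble(Individual_ID = c(3, 4), TAIL = c(85, 115))

all_rows    <- bind_rows(batch_1, batch_2)           # stack the batches: 5 rows
unique_rows <- all_rows %>% distinct()               # drop the duplicated row: 4 rows
first_two   <- unique_rows %>% slice(1:2)            # pick rows by position
row_count   <- unique_rows %>% summarise(n = n())    # count the rows

row_count
```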

Joins

What if we have two separate dataframes that we want to merge?

# sample eight complete individuals, then draw five of them for each dataframe
eight_random_pterosaurs <- pterosaur_data %>% 
  filter(if_all(everything(), ~ !is.na(.x))) %>% 
  slice_sample(n = 8)

eye_stats <- 
  eight_random_pterosaurs %>%
  slice_sample(n = 5) %>% 
  select(Individual_ID, ORBIT) %>% 
  arrange(Individual_ID)

tail_stats <- 
  eight_random_pterosaurs %>%
  slice_sample(n = 5) %>% 
  select(Individual_ID, TAIL) %>% 
  arrange(Individual_ID)

Joins

eye_stats
# A tibble: 5 × 2
  Individual_ID ORBIT
          <dbl> <dbl>
1             7  10  
2            10  12.5
3            26  15  
4            29  13  
5            54  20  
tail_stats
# A tibble: 5 × 2
  Individual_ID  TAIL
          <dbl> <dbl>
1             7   106
2            10   125
3            26   163
4            29   148
5            43   320

For joins to work, there needs to be some common element that links the two dataframes

left_join()

  • Add columns from dataframe y to dataframe x
  • The common element is the Individual_ID
  • Keep all observations in x
eye_stats %>% 
  left_join(tail_stats)
# A tibble: 5 × 3
  Individual_ID ORBIT  TAIL
          <dbl> <dbl> <dbl>
1             7  10     106
2            10  12.5   125
3            26  15     163
4            29  13     148
5            54  20      NA
tail_stats %>% 
  left_join(eye_stats)
# A tibble: 5 × 3
  Individual_ID  TAIL ORBIT
          <dbl> <dbl> <dbl>
1             7   106  10  
2            10   125  12.5
3            26   163  15  
4            29   148  13  
5            43   320  NA  

inner_join()

  • Only keep rows in x that have a matching common element in y
eye_stats %>% 
  inner_join(tail_stats)
# A tibble: 4 × 3
  Individual_ID ORBIT  TAIL
          <dbl> <dbl> <dbl>
1             7  10     106
2            10  12.5   125
3            26  15     163
4            29  13     148
tail_stats %>% 
  inner_join(eye_stats)
# A tibble: 4 × 3
  Individual_ID  TAIL ORBIT
          <dbl> <dbl> <dbl>
1             7   106  10  
2            10   125  12.5
3            26   163  15  
4            29   148  13  
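
The two joins can be compared side by side on a cut-down example. This sketch reuses a few of the values shown above and names the common column explicitly with `by`, which avoids relying on dplyr guessing it:

```r
library(dplyr)

eyes  <- tibble(Individual_ID = c(7, 10, 54), ORBIT = c(10, 12.5, 20))
tails <- tibble(Individual_ID = c(7, 10, 43), TAIL = c(106, 125, 320))

# left_join keeps every row of eyes; individual 54 has no tail record, so TAIL is NA
left <- eyes %>% left_join(tails, by = "Individual_ID")

# inner_join keeps only the individuals present in both dataframes
inner <- eyes %>% inner_join(tails, by = "Individual_ID")

left
inner
```

Individual 43 appears in neither result because it is not in `eyes`, the dataframe on the left of the join.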

Build a Quarto report

  • Use Quarto to publish a report, documenting your code

  • Use this time to tidy your code

  • Use comments within code chunks

  • Write explanations outside of code chunks

  • See the _quarto.yml file